Support of strides in the convolutional layers #239
base: main
Conversation
Yes, it's exactly how I would do it, thanks Jeremie.
I think it is a good implementation. Thanks for your work.
I'd like to offer my support whenever it is needed, feel free to contact me. Right now I'm too busy to develop improvements on my own, but I can cooperate with someone else.
self % gradient(k,iws:iwe,jws:jwe) = self % gradient(k,iws:iwe,jws:jwe) &
  + gdz(n,iws:iwe,jws:jwe) * self % kernel(n,k,1:iwe-iws+1,1:jwe-jws+1)
@Riccardo231 Could you check this, please? I think it has a different behaviour now. However, I am not sure what the goal was before the change, because not all entries of self % gradient were updated (only the entries between istart:iend and jstart:jend were).
I think I implemented the conv2d variant. I'll look at this carefully over the weekend. It's possible that the original code was bad.
I'll have a look tomorrow.
Sorry, just to let you know that I'm really busy with school these days. I can probably have a look after the 5th. Sorry for the delay.
No worries. Whenever you have time. Thank you.
Hello, sadly I have been away more days than expected. However, I am now ready to commit to the project. Do I still need to take a look, or has it already been done? Thank you and sorry for the delay.
@Riccardo231, no problem. If you have time, it would be nice if you could take a look at this PR.
Hello, I just had a look and ran a test with cnn_mnist.f90 without modifying any parameters; the old and new versions both converge to 80% accuracy within 10 epochs. Your changes look good to me, but I didn't take a deep dive. Feel free to let me know whenever anything else is needed.
Hi all, I appreciate your patience with this.
Based on my comment ! dL/dx = dL/dy * sigma'(z) .inner. w, and assuming it is correct given my understanding of how the backward pass of convolutional layers works, this should be an inner product (a sum of element-wise products), rather than element-wise products added element-wise to the gradient.
It is true that the original implementation left the edges of the gradient unchanged, but I think this is merely a consequence of summing over the kernel width down to a scalar result.
So I think the previous code was correct, meaning that we need the full inner product (including the sum that produces the scalar result), not just an element-wise product and assignment.
If I'm correct, it means that Riccardo's conv1d backward pass should be updated to reflect this.
Let me know what you think.
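Written out explicitly (my reading of the comment above, reusing the names from the snippet under discussion, with iws:iwe and jws:jwe denoting the input window touched by the current output element), the inner-product update for a single gradient entry would be something like:

$$
\frac{\partial L}{\partial x_{k,i,j}} \mathrel{+}= \sum_{p=1}^{i_{we}-i_{ws}+1} \sum_{q=1}^{j_{we}-j_{ws}+1} \mathrm{gdz}(n,\; i_{ws}+p-1,\; j_{ws}+q-1) \cdot w(n,k,p,q)
$$

That is, the window of element-wise products is reduced to a scalar before being added to a single entry of self % gradient, instead of a whole window of gradient entries being updated at once.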
I think you are right. So, should it be something like:
self % gradient(k,iws:iwe,jws:jwe) = self % gradient(k,iws:iwe,jws:jwe) &
  + gdz(n,iws:iwe,jws:jwe) * self % kernel(n,k,1:iwe-iws+1,1:jwe-jws+1)

becomes

self % gradient(k,i,j) = self % gradient(k,i,j) &
  + sum(gdz(n,iws:iwe,jws:jwe) * self % kernel(n,k,1:iwe-iws+1,1:jwe-jws+1))
If correct, conv1d must be revised too, as well as locally_connected.
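To make the difference concrete, here is a small self-contained Fortran sketch (toy arrays and names only, not neural-fortran code): the element-wise form adds a whole window of products to a window of gradient entries, while the sum form reduces the same products to one scalar added to a single entry.

program inner_product_demo
  ! Toy illustration only: gdz_win stands in for gdz(n, iws:iwe, jws:jwe)
  ! and kernel_win for self % kernel(n, k, 1:iwe-iws+1, 1:jwe-jws+1).
  implicit none
  real :: gdz_win(3,3), kernel_win(3,3)
  real :: grad_window(3,3), grad_entry

  gdz_win = 1.0
  kernel_win = 2.0
  grad_window = 0.0
  grad_entry = 0.0

  ! Element-wise update: a 3x3 block of products is added to a 3x3 slice
  ! of the gradient (what the current PR lines do).
  grad_window = grad_window + gdz_win * kernel_win

  ! Inner-product update: the same 3x3 products are reduced to a scalar
  ! and added to a single gradient entry (what the suggestion above does).
  grad_entry = grad_entry + sum(gdz_win * kernel_win)

  print *, 'element-wise, one entry of the window:', grad_window(1,1)  ! 2.0
  print *, 'inner product, single entry:          ', grad_entry        ! 18.0
end program inner_product_demo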
@milancurcic @Riccardo231 Pending a comment/question, this PR is ready for review and/or to be merged.
Proposal to support strides in the convolutional layers
@Riccardo231 @milancurcic does this approach make sense? If so, I will continue with the implementation.
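For reference, a minimal sketch of the window indexing I assume stride support would use (illustrative only; the names and the (i-1)*stride + 1 mapping are my assumptions, not necessarily what this PR implements):

program stride_window_demo
  ! Illustrative sketch: for a 1-d convolution with kernel width kw and
  ! stride s (valid padding), output element i reads the input window
  ! (i-1)*s + 1 : (i-1)*s + kw.
  implicit none
  integer, parameter :: input_width = 10, kw = 3, stride = 2
  integer :: i, iws, iwe, output_width

  output_width = (input_width - kw) / stride + 1   ! = 4 here

  do i = 1, output_width
    iws = (i - 1) * stride + 1   ! window start in the input
    iwe = iws + kw - 1           ! window end in the input
    print '(a, i0, a, i0, a, i0)', 'output ', i, ' <- input ', iws, ':', iwe
  end do
end program stride_window_demo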